ABSTRACT

In recent years there have been many reported detections of highly redshifted or blueshifted 
narrow spectral lines (both emission and absorption) in the X-ray spectra of active galaxies, but 
these are all modest detections in terms of their statistical significance. The aim of this paper 
is to review the issue of the significance of these detections and, in particular, take account 
of publication bias. A literature search revealed 38 reported detections of narrow, strongly 
shifted (v/c > 0.05) X-ray lines in the 1.5-20 keV spectra of Seyfert galaxies and quasars. 
These published data show a close, linear relationship between the estimated line strength 
and its uncertainty, in the sense that better observations (with smaller uncertainties) only ever 
show the smallest lines. This result is consistent with many of the reported lines being false 
detections resulting from random fluctuations, drawn from a large body of data and filtered by 
publication bias such that only the most 'significant' fluctuations are ever reported. The reality 
of many of these features, and certainly their prevalence in the population at large, therefore 
remains an open question that is best settled through uniform analysis (and reporting) of higher 
quality observations. 
1 INTRODUCTION

The arrival in recent years of large quantities of X-ray data from
CCD and grating spectrometers has greatly increased the amount of
spectral information available to X-ray astronomers. These data
have vastly improved our understanding of the X-ray properties of
active galaxies such as Seyfert galaxies and quasars. Among the
discoveries made using these data were narrow line-like features in
emission or absorption at unexpected energies in the canonical
2-10 keV X-ray band (see e.g. Yaqoob et al. 1999; Turner et al.
2002; Pounds et al. 2003a; Porquet et al. 2004; Matt et al. 2005;
Longinotti et al. 2006; Cappi 2006). In many cases these have been
identified with transitions in iron that have been strongly
redshifted or blueshifted out of the usual iron line band
(6.4 - 6.9 keV) by high bulk flow velocities (v > 0.05c) or
gravitational effects. Yet they are narrow and line-like,
indicating low velocity dispersion. The narrowness and highly
shifted centroid energies of these features mark them as distinct
from the relativistically broadened emission lines seen in Seyfert
galaxies (see e.g. Fabian et al. 2000; Reynolds & Nowak 2003, for
reviews). The extreme nature of these features means they have
profound implications for the structure and energetics of the
nucleus (Cappi 2006).

Irrespective of their proposed physical importance the reported
features are all rather modest detections, in the sense that the
statistical significance is not outstanding in any single case.
Indeed, the vast majority of reported cases appear to lie in the
"2-3$\sigma$" regime¹. The detection process usually involves
modelling the featureless continuum emission of the target object
and searching for large, localised, positive or negative residuals
between the data and model (spectral lines may appear in emission
or absorption); the larger the contribution of the residuals to the
fit statistic (e.g. the change in $\chi^2$ upon including a line in
the fitted model) the more significant the feature. The wide
bandpass and good spectral resolution of modern detectors, combined
with the number of datasets that have been processed, mean that
data archives for recent spectroscopy missions will contain many
modest signal-to-noise "features" simply from random sampling
fluctuations in the photon counting signal from otherwise
featureless continua. For example, the Tartarus database² contains
some 661 ASCA observations of active galaxies, and the XMM-Newton
Science Archive (XSA³) contains over 1,000 publicly available
"Guest Observer" observations listed under the proposal category
"AGNs, QSOs, BL Lacs and XRB". The number of spectra is arguably
even higher than this because each of the longer observations is
often divided into multiple spectra as a function of time, source
flux, and so on, and indeed many of the line detection papers
report the features to be "transient" (i.e. detected in only a
subset of the data).

1 Usually this means the outcome of a hypothesis test was a p-value
in the range from $\sim 5 \times 10^{-2}$ down to
$\sim 3 \times 10^{-3}$. Such results are usually reported as
"detected at 95 - 99.7 per cent confidence" or given in units of
$\sigma$ by comparison with the tail area under the Normal curve.

2 http://astro.ic.ac.uk/Research/Tartarus/

3 http://xmm.esac.esa.int/xsa/index.shtml

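To make the scale of the problem concrete, the following Python
sketch converts two-sided p-values into the equivalent number of
Gaussian $\sigma$ and estimates how many chance "features" would be
expected when many spectra are searched. It is an illustration
only: the function names are hypothetical, the trials are assumed
to be independent, and the number of line searches per spectrum is
an assumed input rather than a value taken from any of the cited
papers.

```python
from statistics import NormalDist


def p_to_sigma(p_value: float) -> float:
    """Convert a two-sided p-value to the equivalent number of Gaussian sigma."""
    return NormalDist().inv_cdf(1.0 - p_value / 2.0)


def expected_false_detections(n_spectra: int, trials_per_spectrum: int,
                              alpha: float) -> float:
    """Expected number of chance detections at significance threshold alpha.

    Assumes every trial is independent and that no real lines are present.
    """
    return n_spectra * trials_per_spectrum * alpha


def prob_at_least_one(n_trials: int, alpha: float) -> float:
    """Probability that at least one of n independent trials passes alpha."""
    return 1.0 - (1.0 - alpha) ** n_trials


if __name__ == "__main__":
    for p in (5e-2, 3e-3):
        print(f"p = {p:g} corresponds to about {p_to_sigma(p):.2f} sigma")
    # Hypothetical archive: 1000 spectra, one line search each, alpha = 0.01.
    print(expected_false_detections(1000, 1, 0.01))
    print(prob_at_least_one(1000, 0.01))
```

Even with a single search per spectrum, an archive of this assumed
size would be expected to yield a handful of chance features at the
1 per cent level; searching many energies per spectrum raises the
number further.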
The large number of moderately significant detections (see Sect. 2)
might be considered as evidence to support the reality of the
features. However, when considering results presented for
individual datasets drawn from a much larger population of
available data one should be aware of the distorting effects of
"publication bias," also known as the "file-drawer effect" - the
tendency for positive results to be published and negative results
to go unreported ("filed away"). See Sterling (1959), Rosenthal
(1979) and Begg & Berlin (1988) for general discussion of
publication bias, and also Stern & Simes (1997) and Naylor (1997)
for a more recent discussion of the importance of publication bias
in the context of medical trials. One way to test for the presence
of publication bias is through a "funnel plot," originally proposed
to aid meta-analyses of medical trials (Egger et al. 1997), which
compares the size of a trial⁴ to its estimate of the strength of
the effect. All estimates of the strength of an effect should be
symmetrically scattered around the true value, with smaller trials
providing less precise estimates and so larger scatter. In the
absence of publication bias this will result in a symmetric
funnel-shaped plot because the estimates of the effect strength are
independent of the sample sizes, but the scatter is larger for
smaller samples. If publication bias is present, experiments or
observations are less likely to be published if the estimate of the
effect strength is low (or of low significance), i.e. the bias is
against publishing non-detections, leading to an asymmetric funnel
plot in which the strength of the effect is correlated with the
sample size. The funnel plot produced from biased literature is the
same as that from the equivalent unbiased literature but with less
"interesting" results (i.e. less significant or less strong
results, which lie on one side of the funnel) systematically
removed.
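The effect of such a filter can be illustrated with a small
simulation. The Python sketch below is not an analysis performed in
this paper; it assumes Gaussian errors, a hypothetical population
of observations whose 90 per cent uncertainties span two decades,
and a simple publication filter that keeps only estimates exceeding
their uncertainty. It compares a population containing no real
effect with one in which real effects have strengths independent of
data quality. All function names and parameter values are
illustrative assumptions.

```python
import random
from statistics import median

Z90 = 1.645  # half-width of a 90 per cent Gaussian interval, in sigma


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, i.e. the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def simulate(n_obs, real_effect, seed):
    """Return the 'published' (|estimate|, uncertainty) pairs."""
    rng = random.Random(seed)
    published = []
    for _ in range(n_obs):
        err90 = 10 ** rng.uniform(0.5, 2.5)  # assumed range of uncertainties
        # Assumed true strengths, drawn independently of the data quality.
        truth = 10 ** rng.uniform(1.0, 3.0) if real_effect else 0.0
        estimate = rng.gauss(truth, err90 / Z90)
        if abs(estimate) > err90:  # publication filter: "detections" only
            published.append((abs(estimate), err90))
    return published


null_pub = simulate(5000, real_effect=False, seed=1)
real_pub = simulate(5000, real_effect=True, seed=1)

null_ratio = median(s / e for s, e in null_pub)
real_ratio = median(s / e for s, e in real_pub)
null_rho = spearman([s for s, _ in null_pub], [e for _, e in null_pub])
real_rho = spearman([s for s, _ in real_pub], [e for _, e in real_pub])
```

Under these assumptions the filtered no-effect population should
hug the detection limit (a strength-to-uncertainty ratio only
slightly above unity and a tight rank correlation), whereas real
effects of fixed strength should sit well above the limit in the
better data.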

This paper describes a simple meta-analysis of the published 
detections of highly shifted, narrow X-ray lines in active galax- 
ies using a funnel plot-like analysis. The conventional funnel plot 
would not be appropriate in the present context because there is 
expected to be considerable intrinsic heterogeneity in the strength 
and properties of the shifted lines, which means there is no single, 
true value for the strength of the "effect." But the principle of the 
funnel plot should still hold: the estimated strengths of the lines 
should be independent of the quality of the data used to find them 
(in this context quality means essentially the signal-to-noise of the 
data, which is of course closely related to the size of the photon 
sample that constitutes the spectrum). The strength of a real line 
should be independent of the exposure time and detector sensitivity 
used to measure it. 



2 ANALYSIS 

The starting point for the meta-analysis was a search for published
claims of highly shifted, narrow, emission or absorption lines in
the X-ray spectra of Seyfert galaxies and quasars. For the purposes
of the present study, these features are defined as intrinsically
narrow⁵ emission or absorption features found in 1.5 - 20 keV X-ray
spectra of Seyfert galaxies or quasars, that have been identified
with prominent transitions in the X-ray band (e.g. K$\alpha$ lines
of Fe, Ca, Ar, S, Si, Mg) leading to inflow/outflow velocities
v/c > 0.05. In the particular case of iron, emission lines were
accepted if outside the range 6.1 - 7.3 keV (corresponding to
6.4 - 6.9 keV K$\alpha$ lines from Fe I - XXVI at v/c = ±0.05), and
absorption lines if outside the range 6.4 - 7.3 keV (corresponding
to 6.7 - 6.9 keV resonances in H and He-like Fe XXV-XXVI). The
slightly different ranges for absorption and emission correspond to
the different species expected to dominate in each case. This
criterion provides a simple and uniform, albeit arbitrary, way to
distinguish between the highly shifted, isolated, narrow line
features that are the focus of this paper and more mildly shifted
structures likely to be more directly linked to a broad emission
(or absorption) complex centred around 6.4-6.9 keV. The main result
and conclusion of the paper would not be significantly changed if
the v/c threshold were increased (e.g. to v/c > 0.1).

4 The size of the sample used in the study; larger trials, i.e.
those with larger sample sizes, tend to give higher signal-to-noise
results.

5 Narrow in this context is defined to mean unresolved or
marginally resolved in the data, and with $\sigma < 0.1$ keV. The
criterion was used to filter out broad absorption troughs, blends
and photoelectric edges.
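The energy ranges quoted above can be checked with a short
calculation. The Python sketch below assumes a purely line-of-sight
bulk velocity and the special-relativistic Doppler formula, and it
ignores gravitational shifts; the function names and the sign
convention (positive v/c for motion towards the observer) are
choices made for this illustration, not taken from the cited
papers.

```python
import math


def doppler_energy(rest_kev: float, beta: float) -> float:
    """Observed energy of a line with rest energy rest_kev (keV).

    beta = v/c along the line of sight; beta > 0 means approach (blueshift).
    """
    return rest_kev * math.sqrt((1.0 + beta) / (1.0 - beta))


def velocity_from_energy(observed_kev: float, rest_kev: float) -> float:
    """Invert doppler_energy: v/c implied by an observed line energy."""
    ratio_sq = (observed_kev / rest_kev) ** 2
    return (ratio_sq - 1.0) / (ratio_sq + 1.0)


if __name__ == "__main__":
    # Emission: Fe K-alpha rest energies 6.4 - 6.9 keV at v/c = -0.05, +0.05.
    print(doppler_energy(6.4, -0.05), doppler_energy(6.9, +0.05))
    # Absorption: Fe XXV-XXVI resonances at 6.7 - 6.9 keV.
    print(doppler_energy(6.7, -0.05), doppler_energy(6.9, +0.05))
```

Rounded to 0.1 keV, these limits reproduce the 6.1 - 7.3 keV and
6.4 - 7.3 keV exclusion ranges used for emission and absorption
lines respectively.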

An initial list of papers was constructed from all refereed
articles listed in the NASA Astrophysics Data System (ADS⁶)
published between 1995 and 2007 (inclusive), selecting papers with
abstract text that matched the Boolean expression "narrow and X-ray
and line and (redshifted or blueshifted)." The resulting 135 papers
were then examined individually to select only those that reported
new detections of the type of features under investigation. This
provided a list of 12 such papers. By following their "paper
trails" (citations to/from the articles) it was possible to add a
further 14 papers, yielding a total of 26 papers presenting
detections of 38 shifted, narrow lines. Table 1 lists all the line
features found through this literature search.

The X-ray absorption systems reported in the gravitationally lensed
Broad Absorption Line (BAL) quasars (Chartas et al. 2002, 2003,
2007a,b) were treated separately. In every published BAL case at
least one component of the absorption system was reported as
resolved and broad, and so did not match the criterion above. Also,
gravitationally lensed BAL quasars arguably represent a rather
special sample of objects within which there are known high
velocity absorption systems, and so there are good reasons to treat
these objects as distinct from the sample of Seyferts and non-BAL
quasars. For completeness these are included in Table 1 but are not
considered in the discussion that follows.

Table 1. Data and sources used in the meta-analysis. The columns list the following information: (1) source name, (2) redshift, (3) the exposure time of the
observation, (4) centroid energy, (5) equivalent width (EW) and (6) its 90 per cent uncertainty for the line detections, (7) the stated improvement in the fit
statistic due to the line, and (8) the corresponding reference. Negative EWs indicate absorption lines. The '†' symbol indicates the line strength was given only
in photon flux ($10^{-5}$ ph s$^{-1}$ cm$^{-2}$) terms, but converted into EW (see text). Column (3) also indicates the X-ray mission: X (XMM-Newton), C (Chandra), A
(ASCA), B (BeppoSAX). The lensed BAL quasars (see text) are listed separately at the bottom of the table. The results from Pounds et al. (2005) replace those
from Pounds et al. (2003a).

Target name       z      T_exp    E      EW      err(EW)  $\Delta\chi^2$  Reference
                         (ks)     (keV)  (eV)    (eV)
(1)               (2)    (3)      (4)    (5)     (6)      (7)             (8)

PKS 0637-75       0.653  49 (A)   1.6    -58     36       14.6            Yaqoob et al. (1998)
PG 1211+143       0.081  50 (X)   1.63   -14     3        32              Pounds et al. (2003a, 2005)
4U 1344-60        0.013  26 (X)   1.63   19      11       -               Piconcelli et al. (2006)
PG 1211+143       0.081  50 (X)   2.94   -36     9        26              Pounds et al. (2003a, 2005)
PG 0844+349       0.064  20 (X)   3.02   -35     16       7               Pounds et al. (2003b), but see Brinkmann et al. (2006)
NGC 4151          0.003  69 (X)   3.70   10      5        15.6            Nandra et al. (2007)
PG 1211+143       0.081  134 (C)  4.22   -35     16       13.8            Reeves et al. (2005)
Mrk 841           0.036  30 (X)   4.80   50      20       11              Petrucci et al. (2007)
4U 1344-60        0.013  26 (X)   4.9    45      23       -               Piconcelli et al. (2006)
PG 1211+143       0.081  134 (C)  4.93   -57     23       19.9            Reeves et al. (2005)
NGC 4151          0.003  32 (X)   5.23   8       4        18.9            Nandra et al. (2007)
4U 1344-60        0.013  26 (X)   5.3    57      28       -               Piconcelli et al. (2006)
Q0056-363         0.162  94 (X)   5.34   -75     37       -               Matt et al. (2005)
ESO 113-G010      0.026  4 (X)    5.38   265     90       9.6             Porquet et al. (2004)
Mrk 509           0.034  33 (B)   5.45   -173    146      8.7             Dadina et al. (2005)
PG 1416-129       0.129  50 (X)   5.5    194     89       12.8            Porquet et al. (2007a)
Mrk 509           0.034  33 (B)   5.5    -195    83       16.8            Dadina et al. (2005)
NGC 3516          0.009  75 (C)   5.57   23†     4†       -               Turner et al. (2002)
Mrk 766           0.013  130 (X)  5.60   18†     9†       12              Turner et al. (2004)
ESO 198-G24       0.046  7 (X)    5.7    70      40       9.3             Guainazzi (2003); Bianchi et al. (2004)
Mrk 766           0.013  130 (X)  5.75   56†     2?†      13              Turner et al. (2004)
NGC 7314          0.005  97 (C)   5.84   32      16       -               Yaqoob et al. (2003)
NGC 3516          0.009  152 (A)  5.9    -30     -        28.3            Nandra et al. (1999)
Mrk 335           0.026  30 (X)   5.92   -50     21       16              Longinotti et al. (2007)
Ark 120           0.033  57 (X)   6.01   -21     10       21.3            Nandra et al. (2007)
NGC 3227          0.004  100 (X)  6.04   21      9        24.9            Markowitz et al. (2008)
NGC 3516          0.009  57 (X)   6.08   ~40     -        -               Bianchi et al. (2004); Dovciak et al. (2004)
E1821+643         0.297  100 (C)  6.2    -54     13       -               Yaqoob & Serlemitsos (2005)
NGC 4151          0.003  69 (X)   7.33   -15     8        16.8            Nandra et al. (2007)
NGC 4151          0.003  32 (X)   7.45   -16     6        28.0            Nandra et al. (2007)
RXJ0136.9-3510    0.289  195 (A)  7.6    860     401      11.0            Ghosh et al. (2004)
PG 1211+143       0.081  50 (X)   7.61   -105    35       32              Pounds et al. (2003a, 2005)
IC 4329A          0.016  69 (X)   7.68   -15     7        21.0            Markowitz et al. (2006); Nandra et al. (2007)
MCG-5-23-16       0.008  96 (X)   7.7    -33     10       20              Braito et al. (2007)
UGC 3973/Mrk 79   0.022  4 (X)    7.99   161     89       8.1             Gallo et al. (2005)
Mrk 509           0.034  33 (B)   8.14   -383    150      16.4            Dadina et al. (2005)
PG 0844+349       0.064  20 (X)   8.18   -170    60       11              Pounds et al. (2003b), but see Brinkmann et al. (2006)
PKS 2149-306      2.345  20 (A)   17.0   298     204      10.3            Yaqoob et al. (1999)

Lensed BAL quasars

PG 1115+080       1.72   63 (X)   7.38   -140    60       -               Chartas et al. (2003)
APM 08279+5255    3.91   89 (C)   8.05   -240    70       35.2            Chartas et al. (2002)
H 1413+117        2.56   89 (C)   8.5    -       -        -               Chartas et al. (2007b)
PG 1115+080       1.72   63 (X)   9.50   -1400   500      -               Chartas et al. (2003)
APM 08279+5255    3.91   89 (C)   9.79   -430    150      40.2            Chartas et al. (2002)
H 1413+117        2.56   89 (C)   13.9   -2400   1600     -               Chartas et al. (2007b)

The strength of the feature reported by Gallo et al. (2005) was
given in both equivalent width (EW) and photon flux terms but the
relative uncertainties stated for each are different: 13 and 55 per
cent, respectively. Given the modest effect of this feature on the
fit statistic ($\Delta\chi^2 = 8.1$), the larger of these two
uncertainties would appear to be the more plausible, and this value
is used in Table 1, although it should be noted that the conclusion
of the present paper does not depend on this one value. In the
cases of Mrk 766 (Turner et al. 2004) and NGC 3516 (Turner et al.
2002), the line strengths were given only in flux terms. In order
to provide a better comparison with the other lines these were
converted into EW terms using the flux density of the continuum at
the location of the lines, found by fitting the relevant data⁷. The
5.9 keV absorption line in NGC 3516 has no published EW (Nandra et
al. 1999), although spectral fitting of the publicly available data
yielded an estimate of 30 eV⁸. The EW of the 6.08 keV emission line
in NGC 3516 was not stated by Bianchi et al. (2004) but the
analysis of Dovciak et al. (2004) would seem to suggest it is about
40 eV. Both the 5.9 keV and 6.08 keV lines in NGC 3516 were
published without estimates of the uncertainty on their strengths.

6 http://adswww.harvard.edu/

7 The XMM-Newton data for Mrk 766 were obtained through the XSA,
processed with SAS v7.1.0, and EPIC pn spectra for "high" and "low"
periods were extracted from the first 100 ks and last ~30 ks, as
described by Turner et al. (2004). The 3-11 keV spectra were then
fitted with a power law model, excluding the 5-7 keV band
(following Turner et al. 2004). The Chandra HETGS data for NGC 3516
were obtained through the HotGAS database at
http://hotgas.pha.jhu.edu/ and the HEG (3-9 keV) and MEG
(3-6.5 keV) spectra fitted simultaneously with a power law model
after excluding the 5-7 keV interval.

8 The data were obtained from the Tartarus database at
http://astro.ic.ac.uk/research/tartarus/ and fitted over the
3-10 keV range with a power law plus Laor diskline model, and a
Gaussian absorption line, as discussed in Nandra et al. (1999).
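The conversion from line flux to EW used for Mrk 766 and NGC 3516
amounts to dividing the line photon flux by the continuum photon
flux density at the line energy. A minimal Python sketch follows,
assuming a power-law continuum; the function names and the
numerical values in the example are hypothetical and are not the
fitted parameters of either source.

```python
def power_law_density(energy_kev: float, norm_1kev: float,
                      photon_index: float) -> float:
    """Continuum photon flux density (ph s^-1 cm^-2 keV^-1) of a power law.

    norm_1kev is the flux density at 1 keV; photon_index is Gamma.
    """
    return norm_1kev * energy_kev ** (-photon_index)


def equivalent_width_ev(line_flux: float, continuum_density: float) -> float:
    """Equivalent width in eV.

    line_flux is in ph s^-1 cm^-2 and continuum_density is the continuum
    photon flux density at the line energy in ph s^-1 cm^-2 keV^-1.
    """
    return 1000.0 * line_flux / continuum_density


# Hypothetical example: a 1e-5 ph s^-1 cm^-2 line at 5 keV on a
# Gamma = 2 continuum with normalisation 0.01 at 1 keV.
continuum = power_law_density(5.0, 0.01, 2.0)
ew = equivalent_width_ev(1.0e-5, continuum)
```

Because the continuum is far better determined than the line, the
fractional uncertainty on the EW is to a good approximation the
fractional uncertainty on the line flux.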
3 RESULTS

Excluding the lensed BAL quasars, the compiled data list 36 lines
from 23 sources with EWs and 90 per cent uncertainties⁹, which are
indicators of the "signal" and "noise" respectively (similar to the
"effect strength" and "sample size" of the funnel plot). Of these,
17 are emission lines and 19 are absorption lines. Figure 1 shows a
scatter diagram for these two quantities. This diagram serves the
same purpose as the funnel plot, showing whether the strength of
the measured effect (EW) depends on the quality of the data (as
indicated by the uncertainty on EW). If most of the lines are real,
some observations should populate the upper left portion of the
diagram. The zone of avoidance in the lower right arises from the
fact that any line with a 90 per cent confidence interval on its EW
that includes (or extends very close to) zero would probably not be
reported as a detection. Despite two decades of range in the line
strengths there is a clear trend for all the data points to lie
close to the edge of the zone of avoidance, i.e. just above the
detection limit, irrespective of the line strength. If the lines do
indeed span this range in strengths, the strongest lines should be
easily detectable in the best observations (i.e. those with
smallest uncertainties), but only weak features are claimed in all
these cases. It would appear that the strength of any narrow,
relativistically shifted line depends on the quality of the data in
which it was detected.



4 DISCUSSION 

The tendency for stronger lines to be accompanied by
proportionately larger uncertainties (as shown in Figure 1), or
equivalently the relatively constant |EW|/error ratio over the
large range in |EW|, requires explanation. The lines with larger
(absolute) EW should be easy to detect in more sensitive
observations (and should give smaller uncertainties) and so should
populate the upper-left region of Figure 1, but this is not the
case. One simple explanation is that many of the line detections
are the most 'significant' false detections from a large population
of data covering a wide range of power to detect potential lines.
But this raises the question of why there are so many false
detections.

Examination of a spectrum in isolation may or may not provide a
false detection, but an apparently stronger detection will be more
strongly favoured for publication. The many hundreds of
observations that have been examined and not provided detections
(whether with strong or weak limits) would populate the shaded
region of Fig. 1 if only they were published (very few observations
are published with limits on the strength of undetected narrow,
shifted lines). The reported detections may be in effect the "tip
of the iceberg" - the strongest or most significant of a population
of random fluctuations, with the rest of the population unseen due
to publication bias. If the lines are genuine, the challenge is to
explain why all the detections are close to the detection limit
despite the huge range in the quality of the data, or equivalently,
why the largest (absolute) EW lines appear only in the poorest data
with the largest error bars. Of course it is plausible that Fig. 1
shows a mixture of false and true detections, with the true
detections expected mostly among the weaker lines (e.g.
|EW| < 30 eV) since they are necessarily limited to moderately weak
detections even in the best data (which is not true of stronger
lines).

9 As far as can be ascertained, all the confidence regions were
calculated by varying the EW parameter until the observed fit
statistic ($\chi^2$ or C-statistic) increased by 2.706 over its
minimum. In cases where the confidence interval was roughly
symmetric about the best fit, the half width of the stated 90 per
cent confidence region was taken as a single estimate of the
uncertainty. In the case of highly asymmetric intervals, the part
of the interval extending below the best fit (i.e. towards zero EW)
was used.
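The increase of 2.706 in the fit statistic mentioned in footnote 9
is the usual threshold for a 90 per cent interval on a single
parameter of interest. The short Python sketch below shows where
the number comes from, assuming the fit statistic behaves as
$\chi^2$ with one degree of freedom near its minimum (i.e. the
Gaussian limit); the function name is hypothetical.

```python
from statistics import NormalDist


def delta_stat_threshold(confidence: float) -> float:
    """Increase in the fit statistic bounding a two-sided interval.

    Valid for one parameter of interest in the Gaussian limit, where
    the threshold is the square of the two-sided normal quantile.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * z


threshold_90 = delta_stat_threshold(0.90)
```

In the same limit, the half width of a 90 per cent interval is
about 1.645 times the standard error, so a line whose EW only just
exceeds its quoted uncertainty is less than a 2$\sigma$ deviation
from zero.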



4.1 Detection methods 

Virtually all these lines were justified on the basis of p-values
from hypothesis tests being smaller than some threshold $\alpha$, a
process intended to limit the fraction of false detections (Type I
errors) to $\alpha$, and in almost all the cases listed above a
reasonable detection significance level (e.g. $\alpha = 0.05$ or
0.01) was used. Perhaps the p-values were systematically
underestimated, leading to an abundance of false detections?

Estimating the statistical significance of a possible line feature
is indeed rather difficult (see Freeman et al. 1999; Protassov et
al. 2002), and many authors have used and continue to use
inappropriate tests (e.g. the F-test) which may inflate the number
of false detections. Ideally, a thorough and uniform analysis of
all the data would solve the problem that different data analysis
techniques were used by different authors, but that is not the
purpose of the present paper, which simply makes use of the
published results as they are presented in the literature. However,
even a uniform and comprehensive analysis of all the datasets
listed in Table 1, using sophisticated statistical methods, would
not change the basic result (the clustering of lines near the
diagonal in Figure 1) unless it was true that many of the
uncertainties listed in the table are greatly exaggerated. If the
re-calculated uncertainties were often much smaller the typical
|EW|/error would be much higher, and the tendency for strong lines
to have proportionately larger uncertainties might be eroded - the
points would fill more of the top-left region of the figure rather
than skirting the diagonal. But there is no clear reason to suspect
this might be true. If a re-analysis found the claimed
significances (e.g. the p-values used for detection) were slightly
too high or too low this may affect the number of points in the
figure, but not the trend it reveals. If the p-values were
systematically far too small (i.e. too significant) there must be a
large number of false detections present in the figure, whereas if
the p-values are reliable (or even overestimated) the |EW| - error
relation still requires an explanation. Therefore, in the remainder
of this discussion it is assumed that the statistical tests used in
the papers listed in Table 1 are sound, and that it is still
necessary to seek an explanation for the relation shown in
Figure 1.



4.2 Confounding factors 

Figure 1. The "signal-error" diagram for narrow, relativistically shifted iron lines compiled from the data given in Table 1. The diagram shows the estimated
line strengths (absolute value of EW) against the uncertainty in the estimate (half width of the stated 90 per cent confidence interval). The shaded region
indicates the zone of avoidance within which a feature would not be reported as "detected," i.e. EW < err(EW). Open circles indicate absorption lines, filled
squares indicate emission lines, and open triangles represent absorption systems from lensed BAL quasars (see text). The 5.6 keV emission line in NGC 3516
is marked with an arrow to indicate its systematically underestimated uncertainty (see text).

The observed effect could be produced from genuine line detections
if the two variables, line strength and its uncertainty (i.e. data
quality), were correlated with some other factor, such as source
distance. Perhaps the more distant, luminous quasars, which often
yield the poorest spectra, possess intrinsically stronger line
features compared to the nearby, low-luminosity Seyferts that have
high-quality spectra available. Indeed, there is a correlation
between redshift and EW (Spearman rank-correlation coefficient
$\rho = 0.62$, $p = 5.6 \times 10^{-5}$), but it is much weaker
than that between |EW| and its uncertainty ($\rho = 0.96$,
$p < 10^{-15}$), and is due entirely to the four lowest redshift
sources (NGC 4151, NGC 7314, NGC 3516 and NGC 3227 at z < 0.01),
which all have long exposures and showed only very weak lines. If
these sources are ignored (leaving 28 lines) there is no
significant correlation between redshift and |EW| ($\rho = 0.33$,
$p = 9.0 \times 10^{-2}$) or its uncertainty ($\rho = 0.24$,
$p = 0.22$). The tight, linear relation between |EW| and its
uncertainty involves all sources and so cannot be due to the EW (or
its uncertainty) being correlated with redshift. Also, as can be
seen from Table 1, the strong lines from ESO 113-G010, ESO 198-G24
and Mrk 79 have large errors because the observations were very
short (< 8 ks), not because the targets are intrinsically faint. In
principle longer observations of these strong lines should
conclusively demonstrate the reality of these features. However, a
subsequent, much longer (100 ks) observation of ESO 113-G010 by
Porquet et al. (2007b) did not detect the 5.4 keV emission line
previously found in a ~4 ks exposure (Porquet et al. 2004). If
real, the EW must have decreased tenfold between observations to
avoid detection by Porquet et al. (2007b); either a coincidence or
an indication that the original detection of a strong line was
false.
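The rank correlation between |EW| and its uncertainty can be
recomputed from the entries of Table 1. The Python sketch below
uses a hand-written Spearman coefficient with average ranks for
ties, which is an assumption about the exact variant used here. It
includes 35 of the 36 lines with both quantities, omitting the
5.75 keV line in Mrk 766 because its tabulated uncertainty could
not be read reliably, so the result need not match the quoted value
exactly.

```python
# (|EW|, uncertainty) pairs in eV, transcribed from Table 1.
PAIRS = [
    (58, 36), (14, 3), (19, 11), (36, 9), (35, 16), (10, 5), (35, 16),
    (50, 20), (45, 23), (57, 23), (8, 4), (57, 28), (75, 37), (265, 90),
    (173, 146), (194, 89), (195, 83), (23, 4), (18, 9), (70, 40),
    (32, 16), (50, 21), (21, 10), (21, 9), (54, 13), (15, 8), (16, 6),
    (860, 401), (105, 35), (15, 7), (33, 10), (161, 89), (383, 150),
    (170, 60), (298, 204),
]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, i.e. the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


rho = spearman([ew for ew, _ in PAIRS], [err for _, err in PAIRS])
```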

Another possible confounding factor is line energy. The continuum
spectrum of the active galaxies listed in Table 1 is usually well
described by a power law with a photon index typically in the range
$\Gamma \sim 1.5 - 2.5$, meaning there are far fewer photons at
higher energies than lower energies. If narrow lines appearing at
higher energies (e.g. blueshifted Fe K$\alpha$ at E > 7.1 keV)
tended to have higher |EW| than those at lower energies (e.g.
redshifted Fe K$\alpha$ at E < 6.2 keV), the strongest lines may
have the largest uncertainties simply because they occur
preferentially at higher energies. However, there is very little
correlation between line energy and |EW| ($\rho = 0.27$,
$p = 0.11$) or its uncertainty ($\rho = 0.26$, $p = 0.13$). Indeed,
Table 1 shows that some of the smallest EWs (and uncertainties)
were found in lines above 7.1 keV, and the tight relation between
strength and uncertainty is present in the subset of higher-energy
lines. It is not clear what other source property could be the
confounding factor needed to explain the absence of strong lines in
the best observations.

There is one other point worth emphasizing about the line energies.
Of the 38 lines listed in Table 1, 14 were found in the range
5-6 keV. The excess of detections could indicate that modestly
redshifted (z ~ 0.05 - 0.3) emission or absorption features are
more significant or robust than the more strongly shifted features,
i.e. the number of detections is enhanced because there are more
'true' lines in this band. However, the findings of this paper
remain unchanged if one considers only the 13 lines in the 5-6 keV
range with errors: these lines follow the same |EW| - error
correlation as shown in Fig. 1, and the |EW|/error is not higher
for this subsample. Indeed, the mean |EW|/error = 2.4 for the
5-6 keV subset, and 2.5 for the remaining lines. (Excluding the
5.57 keV line in NGC 3516, for reasons discussed below, the 5-6 keV
subsample gives a mean |EW|/error = 2.1.) Even for this subsample
the problem remains to explain the tight, linear correlation
between |EW| and its uncertainty. There are at least two possible
effects that might help explain an enhanced rate of false
detections in this narrow band. The first is confusion with
emission structure from a strong, broad (possibly asymmetric) line
centred around 6.4 keV. Such emission may produce an excess of
counts in the 5-6 keV region, above the expected continuum, and so
enhance the appearance of line-like residuals. The second effect is
observer bias; it is plausible that individual observers
preferentially attend to residuals in the 5-7 keV region
immediately around expected Fe structures. Very few of the papers
listed in Table 1 report that the detections were made on the basis
of a systematic and uniform search over a wide spectral range (a
notable exception is Nandra et al. 2007). Either or both of these
effects may enhance the number of false detections made in the
5-6 keV range.



4.3 The strongest individual cases 

Of the 36 lines listed in Table 1 (ignoring the BALs) with strength
and uncertainty estimates, only five have strengths that are
greater than three times their uncertainty (half width of the 90
per cent confidence interval). These are: emission at 5.57 keV in
NGC 3516 (EW $\approx$ 23 eV; Turner et al. 2002), absorption at
6.2 keV in E1821+643 (EW $\approx$ -54 eV; Yaqoob & Serlemitsos
2005), absorption at 7.7 keV in MCG-5-23-16 (EW $\approx$ -33 eV;
Braito et al. 2007), and 1.63 keV and 2.94 keV absorption in
PG 1211+143 (EW $\approx$ -14 eV and -36 eV; Pounds et al. 2003a,
2005). These are considered in turn below.
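The selection of these cases can be reproduced directly from
Table 1. The Python sketch below lists each line as (target,
energy in keV, EW in eV, uncertainty in eV) and keeps those with
|EW| more than three times the uncertainty. The 5.75 keV line in
Mrk 766 is omitted because its tabulated uncertainty could not be
read reliably; the variable and function names are illustrative.

```python
LINES = [
    ("PKS 0637-75", 1.6, -58, 36), ("PG 1211+143", 1.63, -14, 3),
    ("4U 1344-60", 1.63, 19, 11), ("PG 1211+143", 2.94, -36, 9),
    ("PG 0844+349", 3.02, -35, 16), ("NGC 4151", 3.70, 10, 5),
    ("PG 1211+143", 4.22, -35, 16), ("Mrk 841", 4.80, 50, 20),
    ("4U 1344-60", 4.9, 45, 23), ("PG 1211+143", 4.93, -57, 23),
    ("NGC 4151", 5.23, 8, 4), ("4U 1344-60", 5.3, 57, 28),
    ("Q0056-363", 5.34, -75, 37), ("ESO 113-G010", 5.38, 265, 90),
    ("Mrk 509", 5.45, -173, 146), ("PG 1416-129", 5.5, 194, 89),
    ("Mrk 509", 5.5, -195, 83), ("NGC 3516", 5.57, 23, 4),
    ("Mrk 766", 5.60, 18, 9), ("ESO 198-G24", 5.7, 70, 40),
    ("NGC 7314", 5.84, 32, 16), ("Mrk 335", 5.92, -50, 21),
    ("Ark 120", 6.01, -21, 10), ("NGC 3227", 6.04, 21, 9),
    ("E1821+643", 6.2, -54, 13), ("NGC 4151", 7.33, -15, 8),
    ("NGC 4151", 7.45, -16, 6), ("RXJ0136.9-3510", 7.6, 860, 401),
    ("PG 1211+143", 7.61, -105, 35), ("IC 4329A", 7.68, -15, 7),
    ("MCG-5-23-16", 7.7, -33, 10), ("UGC 3973/Mrk 79", 7.99, 161, 89),
    ("Mrk 509", 8.14, -383, 150), ("PG 0844+349", 8.18, -170, 60),
    ("PKS 2149-306", 17.0, 298, 204),
]


def strongest(lines, factor=3):
    """Return (target, energy) for lines with |EW| > factor * uncertainty."""
    return [(name, energy) for name, energy, ew, err in lines
            if abs(ew) > factor * err]


strong = strongest(LINES)
for name, energy in strong:
    print(f"{name}: {energy} keV")
```

The strict inequality matters for the 7.61 keV line in PG 1211+143,
whose |EW| is exactly three times its quoted uncertainty and which
is therefore not selected.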

The emission line in NGC 3516 was found in Chandra HETGS data, but
could not be detected in the partly simultaneous XMM-Newton data;
those data gave an upper limit on the line flux an order of
magnitude smaller than the Chandra detection (Turner et al. 2002).
If real, the line flux must have decreased by at least one order of
magnitude between the observations. Furthermore it should be noted
that the line energy was held fixed during the evaluation of the
confidence interval on the flux (see the footnote to Table 1 of
Turner et al. 2002). As the line energy was not predicted but
obtained from fitting the data, it should have remained a free
parameter throughout the calculation of confidence intervals,
otherwise the confidence interval may be artificially reduced. In
any event, the procedure described by Turner et al. (2002) differs
from the standard procedure adopted in the other cases and so is
perhaps best considered a lower limit on the size of the confidence
region of that particular line (as indicated in Figure 1).

The absorption line found in the Chandra observation of
E1821+643 was seen in the HEG spectrum but could not be
confirmed in the lower signal-to-noise MEG data. In the case
of PG 1211+143 the simultaneous detection of multiple lines,
and their identification at similar blueshifts, in both CCD (EPIC)
and grating (RGS) data from XMM-Newton (Pounds et al. 2003a)
would seem to put this case on firmer ground. For completeness,
it should be noted that the identification of the lines
in terms of highly blueshifted features has been debated, e.g.
McKernan et al. (2005); Kaspi & Behar (2006); Pounds & Reeves
(2007); Reeves et al. (2008). Reeves et al. (2005) presented
Chandra grating data of the same object and again reported absorption
lines, but redshifted rather than blueshifted. The 7.6 keV absorption
line found in the XMM-Newton data was not detected in more recent
Suzaku data (with an upper limit ~ 4 times smaller than the
original XMM-Newton detection; Reeves et al. 2008).

Braito et al. (2007) and Reeves et al. (2007) studied
MCG-5-23-16 using simultaneous XMM-Newton, Chandra, Suzaku and
RXTE observations. The detection of 7.7 keV absorption is based
on the EPIC pn spectrum from XMM-Newton. The EPIC MOS
spectrum is consistent with the pn spectrum but is unable to confirm
the presence of the line due to the smaller photon sample size; the
Suzaku data show a possible absorption feature, but it was poorly
constrained compared to the EPIC pn data; and the Chandra data
were unable to confirm the line detection.

The lines that gave the largest improvement in the χ² fit
statistic were both absorption lines: at 5.9 keV in NGC 3516 (with
ASCA) and at 7.45 keV in NGC 4151 (with XMM-Newton). The
former has no published EW, while the latter has a surprisingly low
|EW|/error ratio given its apparent effect on the fit. However, as
Nandra et al. (2007) noted, there is some ambiguity over whether
the NGC 4151 feature should be identified with a line or an edge
(see their section 8.8.4).

The Seyfert galaxy NGC 3516 provides one more interesting
example. Dovčiak et al. (2004) described a 6.08 keV emission line
in an XMM-Newton EPIC pn spectrum of NGC 3516 taken in 2001
April (see also Bianchi et al. 2004) and suggested the feature
varied in EW and/or energy. The significance of the feature was
assessed using an F-test (see section 4.1) but no uncertainties were
given on the EW and so the observation is not represented in
Figure 1. The same observation was analysed by Iwasawa et al. (2004),
who claimed a periodic modulation in the spectral shape of a broad
5.6 - 6.5 keV line based on ~ 3 'cycles'. Although this claim is
intriguing it is not an independent confirmation of the significance
of the ~ 6.1 keV line; the analysis is an attempt to better understand
and model the feature reported by Dovčiak et al. (2004), assuming
its reality and using the same data, not an independent assessment
of it.



4.4 The effect of selection and publication bias 

This leaves the possibility that many or most of the line detections
are false detections caused by random sampling fluctuations. The
number of false detections may at first sight appear large, but one
must remember that each spectrum from XMM-Newton, Chandra
etc. contains > 50 resolution elements, and there are many
hundreds of spectra (especially considering that longer observations
are routinely split into multiple spectra corresponding to different
time intervals, flux levels, etc.). The non-detection in each
resolution element of each spectrum contributes an (unpublished) point
inside (or just above) the shaded region of Fig. 1. The human
analyst, or an automated search algorithm, has a tendency to focus on
the largest fluctuations - this is a selection bias. These may then
be subjected to an hypothesis test, and those satisfying some
conventional criterion (p < α with e.g. α = 0.05) may be chosen for
publication. The more 'significant' the result (i.e. the smaller p is),
the more likely it is to be chosen for publication: this is publication
bias. These two biases act in the same direction but at different
stages in the process, and are examples of what Francis Bacon, in his
Novum Organum, described as the tendency to "notice the events where
they are fulfilled, but where they fail, though this happens much
more often, neglect and pass them by" (Ariew & Watkins 2000).
The result is that the vastly greater number of null results (whether
with strong or weak limits) will go largely unpublished, making it
difficult to estimate the global significance of any individual
detection.
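
The combined effect of the two biases can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the analysis used in this paper: every resolution element contains pure noise, the uncertainties are taken to be 1σ values drawn log-uniformly between 5 and 100 eV (an arbitrary choice), and only fluctuations exceeding three times their uncertainty are "reported". The function names are hypothetical.

```python
import math
import random


def simulate_reported_lines(n_elements=25000, threshold=3.0, seed=1):
    """Draw pure-noise 'line strengths' and keep only the large fluctuations.

    Returns a list of (error, strength) pairs that pass the selection.
    """
    rng = random.Random(seed)
    reported = []
    for _ in range(n_elements):
        # Assumed 1-sigma uncertainty, log-uniform between 5 and 100 eV.
        error = math.exp(rng.uniform(math.log(5.0), math.log(100.0)))
        # No real line anywhere: the measured strength is noise alone.
        strength = rng.gauss(0.0, error)
        # Selection plus publication filter: only 'significant' values survive.
        if abs(strength) > threshold * error:
            reported.append((error, strength))
    return reported


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


reported = simulate_reported_lines()
log_err = [math.log(e) for e, _ in reported]
log_ew = [math.log(abs(s)) for _, s in reported]
print(len(reported), "of 25000 null elements pass the filter")
print("correlation of log|EW| with log(error):",
      round(pearson(log_err, log_ew), 3))
```

Under these assumptions the surviving "detections" lie just above the threshold, so their strengths track their uncertainties closely even though no line is present in any of the simulated data.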

The question that needs to be addressed is whether any given 
excess or deficit in a spectrum is unlikely to be a sampling fluctua- 
tion given a large number of spectra each with many resolution el- 
ements. The only systematic attempts to address this specific prob- 
lem are those of Nandra et al. (2007) and Longinotti et al. (2006), 
both of which describe surveys of narrow, shifted lines in samples 
of observations. 

One can make an order of magnitude estimate of the number
of unpublished non-detections using the following simple argument.
Let us assume that > 500 spectra have been examined in the
last few years and each has > 50 resolution elements; the number
of independent spectral resolution elements that have been examined
must then be > 2.5 × 10^4. If the residuals (after fitting a suitable
continuum model) in each of these are approximately Normally
distributed, the expected numbers of fluctuations at |z| > 2σ, 3σ and
4σ are > 1137.5, > 67.5 and > 1.6. The data in Table 1 record 23
detections with |EW|/error > 2 and 5 with |EW|/error > 3. Given
that the quoted errors are the half widths of the 90 per cent
confidence interval, these might better correspond to ~ 3σ and ~ 4σ
detections,¹ suggesting the expected number of large fluctuations is
consistent with the amount of available data. Furthermore, it seems
reasonable that lower significance features are less likely to be
reported, and therefore the fraction of unreported ~ 3σ detections
would be larger than the fraction of unreported ~ 4σ detections,
hence the lower than expected number of (reported) detections with
|EW|/error in the range 2-3. Another way to consider this is to
compare the significance of individual lines, estimated accounting
for the number of spectral channels, to the number of examined
spectra. For example, one of the best detections is in E1821+643,
where Yaqoob & Serlemitsos (2005) estimate a 2-3σ detection
over the entire spectrum, but given > 500 spectra one would expect
> 22.7 detections at better than 2σ and > 1.3 at better than 3σ, so
perhaps this is not unexpected. See Scargle (2000) for a discussion
of methods to estimate the number of unpublished observations.
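
The arithmetic behind these numbers can be reproduced in a few lines. The sketch below assumes exactly Normal residuals and uses the lower bounds quoted above (500 spectra, 50 resolution elements each); the function name is illustrative.

```python
import math


def two_sided_tail(z):
    """P(|Z| > z) for a standard Normal variable Z."""
    return math.erfc(z / math.sqrt(2.0))


n_spectra = 500    # lower bound assumed in the text
n_elements = 50    # resolution elements per spectrum (lower bound)
n_total = n_spectra * n_elements

# Expected number of chance fluctuations over all resolution elements.
for z in (2, 3, 4):
    expected = n_total * two_sided_tail(z)
    print(f"|z| > {z} sigma over all elements: > {expected:.1f}")

# Expected number when each spectrum counts as a single test.
for z in (2, 3):
    expected = n_spectra * two_sided_tail(z)
    print(f"|z| > {z} sigma over all spectra:  > {expected:.2f}")
```

Because both counts are lower bounds, the printed values are lower bounds too; doubling the assumed number of spectra simply doubles every entry.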



4.5 Post hoc and a priori arguments 

One cannot argue post hoc that there are particular properties of
individual datasets that allow them to be considered in isolation.²
For example, in the case of E1821+643 one might argue that the
6.2 keV line was detected in high resolution Chandra HETGS data
and there are far fewer of these observations, therefore the
number of (unpublished) non-detections is much smaller. One could
conceivably construct a similar argument in favour of the
uniqueness of almost any observation. But if N observations are analysed
and subjected to hypothesis tests with a detection threshold of α (it
does not matter if the tests are the same), the expected number of
false detections is ~ Nα, even if the data come from different target
sources, missions, detectors etc.
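
The ~ Nα scaling follows from the linearity of expectation, so it does not depend on how the tests are grouped. A minimal sketch, with entirely hypothetical counts and thresholds chosen only for illustration:

```python
def expected_false_detections(groups):
    """Sum N_i * alpha_i over groups of tests, each given as (n_tests, alpha)."""
    return sum(n_tests * alpha for n_tests, alpha in groups)


# Hypothetical bookkeeping: these counts and thresholds are illustrative only
# and are not taken from the literature survey.
groups = [
    (400, 0.05),   # e.g. CCD spectra tested at alpha = 0.05
    (80, 0.05),    # e.g. grating spectra, same threshold
    (20, 0.01),    # e.g. a small set tested more strictly
]
print("expected false detections:", expected_false_detections(groups))
```

Treating the smallest group in isolation gives a small expected number for that group alone, but the false detections expected across everything that was examined are unchanged.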

¹ This is a crude approximation. Interval estimation and hypothesis
testing are different statistical procedures and the relative
uncertainty of the strength of a line will not in general be simply
related to its significance in an hypothesis test.

² The possible exceptions are the lensed BAL quasars, which are
exceptional sources for which there is prior knowledge of high velocity
outflows (from their rest-frame ultraviolet spectra). One might expect
stronger absorption systems from these faint sources, and so they
might reasonably be considered separately (they were not included in
the calculations performed above).

Similarly, it is not valid to argue that certain time intervals of
a particular observation should be treated as special unless they are
selected on the basis of an explicit, prior criterion (i.e. derived and
used independently of line detection). The expected number of false
detections scales with the number of tests performed, therefore if
a long, high quality observation is split into ten time intervals and
each is examined (if only in a cursory fashion), this has increased
the effective number of tests approximately tenfold. Additionally,
one cannot engineer the data slicing to maximise the detection of
a line, based on the detection of the line in the same data finely
sliced, and consider this a fair test. This would be to test a
hypothesis suggested by the data, as if the data were independent; one may
equally well roll a die fifty times, find the most frequent number to
be rolled and then claim the die is biased because there is only a
1/6 chance of that number being the most frequent.
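
The die analogy is easy to check numerically. The sketch below rolls a fair die fifty times and records how often the most frequent face appeared; the seed, the number of trials and the function name are arbitrary choices made for illustration.

```python
import random
from collections import Counter


def most_frequent_count(rng, n_rolls=50):
    """Roll a fair die n_rolls times; return the count of the most frequent face."""
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    return Counter(rolls).most_common(1)[0][1]


rng = random.Random(0)
n_trials = 2000
counts = [most_frequent_count(rng) for _ in range(n_trials)]
mean_top = sum(counts) / n_trials

print("expected count for a face chosen in advance:", 50 / 6)
print("mean count of the face chosen after looking:", mean_top)
```

A face picked after inspecting the rolls always appears at least nine times in fifty, above the 50/6 expected for a face named in advance, although the die is fair by construction. Testing that face as if it had been chosen beforehand therefore overstates the evidence for bias.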

The plausibility of a line detection would be augmented if 
there was cogent prior information on the line properties that was 
confirmed in subsequent observations. For example, a small excess 
in a spectrum at 6.4 keV might be considered a significant detection 
of an iron line, but the same excess at an arbitrary energy might not 
be significant. The coincidence of an excess appearing at the pre- 
dicted energy adds to its plausibility as a real line (and this can be 
included in formal calculations of its 'significance'). However, as 
discussed above, the lines appear at arbitrary energies and are often 
reported to be transient which, if true, makes prediction difficult. 
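
One simple way to see how a predicted energy changes the formal significance is a trials-factor correction. The sketch below assumes independent resolution elements and an illustrative local p-value of 0.0027 (roughly a 3σ two-sided fluctuation); the function name is hypothetical, and this is not necessarily the calculation used in any of the papers discussed.

```python
def global_p_value(local_p, n_trials):
    """Chance of at least one fluctuation this extreme in n_trials independent tests."""
    return 1.0 - (1.0 - local_p) ** n_trials


local_p = 0.0027   # assumed local p-value in a single resolution element

# A line at an energy predicted in advance is a single test.
print("predicted energy (1 trial):  ", global_p_value(local_p, 1))

# A line at an arbitrary energy could have appeared in any of ~50 elements.
print("arbitrary energy (50 trials):", global_p_value(local_p, 50))
```

With 50 resolution elements searched, the same excess is far less surprising than it would be at a single energy fixed beforehand.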

If real, these lines represent quite extreme physical phenom- 
ena. But, at least in the case of absorption lines, low outflow- 
velocity absorption systems are routinely seen in the X-ray spectra 
of Seyfert galaxies and quasars, and high velocity outflows are ob- 
served in the rest-frame ultraviolet spectra of BAL quasars. Based 
on these one may argue that the existence of higher velocity X-ray 
outflows in other (i.e. non-BAL) sources is at least plausible. By 
contrast the existence of narrow, redshifted absorption, and highly 
red- or blue-shifted emission lines is unprecedented, and it is
reasonable to demand high standards of evidence to support their
existence. It is perhaps worth noting that the 15 emission lines from
Table 1 have a lower average |EW|/error ratio than the absorption
lines, and none of them has |EW|/error > 2.5 (the 5.6 keV emission
line in NGC 3516 is excepted for the reasons given above).



4.6 Final remarks 

Of course, new and exciting discoveries are usually made at the 
limits of the available data, but these must be confirmed at higher 
significance by subsequent observations; when more detections are 
made but the significance does not improve despite more, longer 
observations and brighter targets one should not automatically con- 
sider the discovery confirmed. It is not the objective of this paper 
to argue that any specific detection is false; the argument based on 
Fig. 1 is purely statistical. Indeed, several of the existing detections
may be genuine - the case is arguably strongest for absorption lines
with the smallest EW (e.g. EW < 30 eV) and largest Δχ² - but it
is difficult to explain the tightness of the correlation between
detected line strength and its uncertainty if all or most of the
detections are genuine. The prevalence and importance of such features
in the population at large therefore remains an open question.